Abstract

In this paper, we study the problem of designing proxies (or portfolios) for various stock 
market indices based on historical data. We use four different methods for computing market 
indices, all of which are formulas used in actual stock market analysis. For each index, we 
consider three criteria for designing the proxy: the proxy must either track the market index,
outperform the market index, or perform within a margin of error of the index while maintaining
a low volatility. In eleven of the twelve cases (all combinations of four indices with three criteria,
except the problem of sacrificing return for less volatility using the price-relative index), we show
that the problem is NP-hard, and hence most likely intractable.

1 Introduction 

Market indices are widely used to track the performance of stocks or to design investment portfolios.
This paper initiates a rigorous mathematical study of the computational complexity of the art of
designing proxies for such indices. There are several results on selecting such proxies (or portfolios) 
in an on-line manner (see, for example, || and ||), but we look at off-line algorithms for designing 
proxies based on historical data. In particular, we show that, with one exception, all combinations 
of three fundamental problems (such as tracking or outperforming a full market index) with four 
commonly-used indices give NP-complete problems, so are computationally hard. We conjecture 
that the one remaining problem is also NP-complete, but do not have a proof at this time. 



To formally define market indices, let $B$ be a set of b stocks in a market. Let $S_{i,t}$ be the
price of the i-th stock at time t. Let $w_i$ be the number of outstanding shares of the i-th stock.
We assume that $w_i$ does not change with time. This paper discusses computational complexity
issues regarding four kinds of market indices currently in use [jij. These indices are calculated by 
the following formulas, which can be multiplied by arbitrary constants to arrive at desired starting 
index values at time 0. 

• The price-weighted index of $B$ at time t is

$$\Phi_1(B,t) = \frac{1}{b} \sum_{i=1}^{b} S_{i,t}. \qquad (1)$$



*An abstract appeared in the Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms, 
1999. 
The Dow Jones Industrial Average is calculated in this manner for some $B$ consisting of thirty
stocks. 

• The value-weighted index of $B$ at time t is

$$\Phi_2(B,t) = \frac{\sum_{i=1}^{b} w_i S_{i,t}}{\sum_{i=1}^{b} w_i S_{i,0}}.$$

The Standard & Poor's 500 is computed in this way with respect to 500 stocks.

• The equal-weighted index of $B$ at time t is

$$\Phi_3(B,t) = \sum_{i=1}^{b} \frac{S_{i,t}}{S_{i,0}}.$$

The index published by the Indicator Digest is calculated by this method, involving stocks listed
on the New York Stock Exchange.

• The price-relative index of $B$ at time t is

$$\Phi_4(B,t) = \left( \prod_{i=1}^{b} \frac{S_{i,t}}{S_{i,0}} \right)^{1/b}.$$

The Value Line Index is computed by this formula.
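The four index definitions can be illustrated with a short Python sketch. This is only a hedged illustration, not part of the original paper: the function names and the toy data are hypothetical, the value-weighted form is taken to be the ratio of total market value at time t to total market value at time 0 (the form used in Section 4.1), and the price-relative form is assumed to be the geometric mean of the price relatives (as described in Section 4.3).

```python
from math import prod

def price_weighted(prices, t):
    """Phi_1: arithmetic mean of the prices at time t."""
    return sum(p[t] for p in prices) / len(prices)

def value_weighted(prices, shares, t):
    """Phi_2 (assumed form): market value at time t over market value at time 0."""
    now = sum(w * p[t] for w, p in zip(shares, prices))
    base = sum(w * p[0] for w, p in zip(shares, prices))
    return now / base

def equal_weighted(prices, t):
    """Phi_3: sum of the price relatives S[i][t] / S[i][0]."""
    return sum(p[t] / p[0] for p in prices)

def price_relative(prices, t):
    """Phi_4 (assumed form): geometric mean of the price relatives."""
    return prod(p[t] / p[0] for p in prices) ** (1 / len(prices))

# Hypothetical toy market: prices[i][t] for three stocks and two time steps.
prices = [[10.0, 12.0], [20.0, 18.0], [5.0, 10.0]]
shares = [100, 50, 200]
```

Each function takes the whole price table, so a proxy can be evaluated by passing only the rows of the selected stocks.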

There are numerous reasons why stock investors and money managers would want to invest in 
a subset of stocks rather than those of a whole market GJ. For instance, small investors certainly 
do not have sufficient capital to invest in every stock in the market. Logically, such investors would 
attempt to choose a small subset of stocks which hopefully can perform roughly as well as or even 
outperform the market as a whole. They then face difficult trade-offs between returns and risks. 
For these and other reasons of optimization, we formulate three natural computational problems 
for the design of market indices. Given a market $M$ consisting of m stocks, we wish to choose
a subset $M_k$ of at most k stocks and calculate an index of $M_k$, which is called a k-proxy of the
corresponding index of the whole market $M$ (we sometimes refer to $M_k$ as a portfolio). Our goal
is to choose $M_k$ so that the resulting k-proxy tracks or outperforms the corresponding index of
$M$. This paper shows that designing proxies for the above four indices based on historical data is
computationally hard.

We note here that while our problem statements might sound rather restrictive since error 
bounds must be met for every time step, we can use simple padding arguments to extend all of our 
proofs to more relaxed problems of the form "can the error bound be met x percent of the time?" 

2 Problem Formulations 

In this section we formally define three basic problems related to selecting k-proxies, or portfolios.

Problem 1 (tracking an index) 

Input: A market $M$ of m stocks, their prices $S_{i,t}$ for t = 0, . . . , f, their numbers $w_i$ of
outstanding shares, a real $\epsilon_1 > 0$, an integer k > 0, and some $j \in \{1,2,3,4\}$ to indicate the
desired type of index.

Output: A subset $M_k$ of at most k stocks in $M$ such that



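$$\left| \frac{\Phi_j(M_k,t)}{\Phi_j(M_k,0)} - \frac{\Phi_j(M,t)}{\Phi_j(M,0)} \right| \le \epsilon_1 \cdot \frac{\Phi_j(M,t)}{\Phi_j(M,0)} \quad \text{for all } t = 1, \ldots, f. \qquad (2)$$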
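To make the tracking problem concrete, the following Python sketch searches all subsets of at most k stocks by brute force. It is a hedged illustration only: it assumes the tracking bound has the relative-error form used later in the proof of Lemma 4.1 (the proxy's normalized index must stay within a factor $\epsilon_1$ of the market's normalized index at every time step), it uses the price-weighted index, and the names `phi1`, `tracks`, and `find_proxy` are hypothetical. The exhaustive search takes exponential time in general, which is in line with the hardness results of this paper.

```python
from itertools import combinations

def phi1(prices, rows, t):
    """Price-weighted index of the stocks in `rows` at time t."""
    return sum(prices[i][t] for i in rows) / len(rows)

def tracks(prices, rows, eps):
    """Check the assumed tracking bound at every time step t = 1..f."""
    all_rows = range(len(prices))
    last = len(prices[0]) - 1
    for t in range(1, last + 1):
        proxy = phi1(prices, rows, t) / phi1(prices, rows, 0)
        market = phi1(prices, all_rows, t) / phi1(prices, all_rows, 0)
        if abs(proxy - market) > eps * market:
            return False
    return True

def find_proxy(prices, k, eps):
    """Return the first subset of at most k stocks that tracks, or None."""
    for size in range(1, k + 1):
        for rows in combinations(range(len(prices)), size):
            if tracks(prices, rows, eps):
                return rows
    return None
```

For example, with the hypothetical table `[[1, 2], [1, 4], [1, 3]]`, only the third stock alone follows the market average exactly, so `find_proxy` with k = 1 selects it.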
Problem 2 (outperforming an index)

Input: A market $M$ of m stocks, their prices $S_{i,t}$ for t = 0, . . . , f, their numbers $w_i$ of
outstanding shares, a real $\epsilon_2 > 0$, an integer k > 0, and some $j \in \{1,2,3,4\}$ to indicate the
desired type of index.

Output: A subset $M_k$ of at most k stocks in $M$ such that

For the final problem, we need a few extra definitions in order to analyze the volatility of a set
of stocks. Let $B$ be a set of stocks as defined in §1.
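• The one-period return of $\Phi_j$ for $B$ at time t ≥ 1 is

$$R_j(B,t) = \ln \frac{\Phi_j(B,t)}{\Phi_j(B,t-1)}.$$

• The average return of $\Phi_j$ for $B$ up to time t ≥ 1 is

$$\bar{R}_j(B,t) = \frac{1}{t} \sum_{i=1}^{t} R_j(B,i).$$

• The volatility of $\Phi_j$ for $B$ up to time t ≥ 2 is

$$\Delta_j(B,t) = \sqrt{\frac{\sum_{i=1}^{t} \left( R_j(B,i) - \bar{R}_j(B,t) \right)^2}{t-1}}.$$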

Problem 3 (sacrificing return for less volatility) 

Input: A market $M$ of m stocks, their prices $S_{i,t}$ for t = 0, . . . , f, their numbers $w_i$ of
outstanding shares, two reals $\alpha, \beta > 0$, an integer k > 0, and some $j \in \{1, 2, 3, 4\}$ to indicate
the desired type of index.

Output: A subset $M_k$ of at most k stocks in $M$ such that

$$\frac{\Phi_j(M_k,t)}{\Phi_j(M_k,0)} \ge \alpha \cdot \frac{\Phi_j(M,t)}{\Phi_j(M,0)} \quad \text{for all } t = 1, \ldots, f, \qquad (4)$$

$$\Delta_j(M_k,s) \le \beta \cdot \Delta_j(M,s) \quad \text{for all } s = 2, \ldots, f. \qquad (5)$$

In this problem, (4) is called the performance bound, and (5) is called the volatility bound.
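The return and volatility definitions used in Problem 3 can be sketched in Python as follows. This is a hedged illustration under the assumption that the one-period return is the log ratio of consecutive index values, the average return is the mean of the first t one-period returns, and the volatility is the sample standard deviation of those returns (divisor t - 1); the function names are hypothetical.

```python
from math import log, sqrt

def one_period_returns(index_values):
    """Return [R(1), ..., R(f)] with R(t) = ln(Phi(t) / Phi(t-1))."""
    return [log(b / a) for a, b in zip(index_values, index_values[1:])]

def average_return(returns, t):
    """Mean of the first t one-period returns (t >= 1)."""
    return sum(returns[:t]) / t

def volatility(returns, t):
    """Sample standard deviation of the first t returns (t >= 2)."""
    mean = average_return(returns, t)
    return sqrt(sum((r - mean) ** 2 for r in returns[:t]) / (t - 1))
```

Because the log returns telescope, `average_return(returns, t)` equals `log(index_values[t] / index_values[0]) / t`, which is the identity the volatility analysis in Section 3.2.3 relies on.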

3 Price-weighted Index 

In this section, we consider taking the value of the market and portfolio using a price-weighted
index, defined in (1). As given in the problem statements, we use the notation $\Phi_1(M,t)$ to denote
the market average at timestep t, and $\Phi_1(M_k,t)$ to denote the average of the portfolio at that
timestep.
3.1 Tracking an index



To solve the problem of tracking the market average, we need to satisfy (2) using function $\Phi_1(B,t)$.
We will refer to this bound as the "tracking bound." In the following proofs, we show this by
proving an equivalent relation:

$$1 - \epsilon \le \frac{\Phi_1(M,0)}{\Phi_1(M_k,0)} \cdot \frac{\Phi_1(M_k,t)}{\Phi_1(M,t)} \le 1 + \epsilon. \qquad (6)$$



Theorem 3.1 Let $\epsilon$ be any error bound satisfying $0 < \epsilon < 1$ and specified using bits in fixed
point notation. Then the tracking problem for a price-weighted index with error bound $\epsilon$ is NP-hard.

In the remainder of this section, we prove this theorem by reduction from the minimum set
cover problem. We will use the notation from the minimum cover definition given in the classic
book on NP-completeness by Garey and Johnson: $C$ is a collection of subsets of a finite set $S$,
and $K$ is the desired cover size. Specifically, we want a subcollection $C' \subseteq C$ such that $|C'| \le K$
and every item $x \in S$ is in some subset from $C'$.

Let $n = |C|$, and consider making an $n \times |S|$ matrix in which each column corresponds to a
fixed item from $S$, and each row corresponds to a subset $S' \in C$. The element in row i, column j is
some given value $v_1$ if the element in $S$ for that column is in the subset $S'$, and value $v_0$ if it is not.
Then the minimum cover problem can be stated as follows: Is there a set of $K$ rows such that the
$K \times |S|$ matrix defined using only those rows has at least one entry with value $v_1$ in each column?
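The matrix view of minimum set cover described above can be sketched in Python. This is a hedged illustration with hypothetical function names and placeholder values 0 and 1 for the two entry values; the actual values used in the reduction are chosen later in this section.

```python
def cover_matrix(subsets, universe, v0=0, v1=1):
    """Row i, column j holds v1 if universe[j] is in subsets[i], else v0."""
    return [[v1 if x in s else v0 for x in universe] for s in subsets]

def rows_cover(matrix, rows, v1=1):
    """True if every column has at least one v1 entry among the chosen rows."""
    columns = range(len(matrix[0]))
    return all(any(matrix[i][j] == v1 for i in rows) for j in columns)

# Hypothetical instance: three subsets of the universe {1, 2, 3}.
universe = [1, 2, 3]
subsets = [{1, 2}, {2, 3}, {3}]
matrix = cover_matrix(subsets, universe)
```

Choosing rows 0 and 1 covers every column, while row 0 alone leaves the column for element 3 without any high entry.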

It makes sense now to consider this $n \times |S|$ matrix as an input to the portfolio selection problem,
where each row corresponds to a stock and each column corresponds to a timestep, and we are to
choose a portfolio of size k = K. Selecting a portfolio is then equivalent to selecting the subcollection
in the minimum cover problem. A subcollection that is missing some item from $S$ corresponds to a
portfolio in which some timestep has all values equal to $v_0$, and hence the portfolio average at that
timestep must be $v_0$. Ideally, we would select $v_0$ and $v_1$ in such a way that the required tracking
bound is met if any $v_1$ values are included in the portfolio, but not if all values are $v_0$. However,
this simple construction has very unpredictable market averages at each time step, so we need a
slightly more involved construction.

We will introduce a new row into our matrix called the "adjustment row", and we will select
values to adjust the column averages to predictable values. To guarantee that this row is not selected
in our portfolio (so selections are made up entirely of rows from the minimum cover problem), we
introduce a special column called the "control column": any selection including our adjustment
row will violate the error bound in that column, and no selection excluding that row will violate
the bound. In addition, we need to pad the problem out substantially. This is accomplished by
including rows that contain value $v_0$ in every non-control column, which is equivalent to padding
the original set cover problem instance with empty subsets added to $C$. This clearly has no effect
on the set cover problem. Finally, we insert a column of all ones to give the $S_{i,0}$ values for the
portfolio selection problem. The final matrix contains m = 3n rows, $f = |S| + 2$ columns, and is
depicted in Figure 1.

Note that since $S_{i,0} = 1$ for all i, $\Phi_1(M,0) = \Phi_1(M_k,0) = 1$, and so (6) reduces to just checking
that

$$1 - \epsilon \le \frac{\Phi_1(M_k,t)}{\Phi_1(M,t)} \le 1 + \epsilon.$$

Figure 1: Pictorial depiction of reduction for Theorem 3.1



First we examine properties of the control column, where the values in that column are defined
by

$$c_0 = \left\lceil \frac{1-\epsilon}{\epsilon} \right\rceil, \qquad c_1 = c_0 + m.$$



Lemma 3.1 The tracking bound is met for the control column if and only if the adjustment row is 
not included in the portfolio. 

Proof: From the values for $c_0$ and $c_1$, it is clear that the average value of the control column is
$c_0 + 1$. Since we will be examining the error of approximations relative to this average, we first
note that we can bound (due to the ceiling involved in the definition of $c_0$)

$$\frac{\epsilon}{1+\epsilon} < \frac{1}{c_0+1} \le \epsilon. \qquad (7)$$

Any portfolio that does not include the adjustment row has average value $c_0$, and so we can lower
bound the relative error by

$$\frac{\Phi_1(M_k,t)}{\Phi_1(M,t)} = \frac{c_0}{c_0+1} = 1 - \frac{1}{c_0+1} \ge 1 - \epsilon.$$

Since the relative error is clearly less than one, it falls into the acceptable range of values.

On the other hand, if a portfolio does include the adjustment row, then the portfolio average is
$c_0 + m/k$, and so the relative error is

$$\frac{\Phi_1(M_k,t)}{\Phi_1(M,t)} = \frac{c_0 + m/k}{c_0+1} = 1 + \frac{m/k - 1}{c_0+1}.$$
Due to our padding of the problem, we know that $k \le m/3$, and so $m/k - 1 \ge 2$. Using this
observation and the bound from (7) leads to the conclusion that

$$\frac{\Phi_1(M_k,t)}{\Phi_1(M,t)} \ge 1 + \frac{2}{c_0+1} > 1 + \frac{2\epsilon}{1+\epsilon} > 1 + \epsilon.$$



In other words, any portfolio that includes the adjustment row will not meet the required error
bound. Combined with our previous observation, this completes the proof of the lemma. ∎
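The behaviour of the control column can be checked numerically with exact rational arithmetic. The sketch below is a hedged illustration: it assumes the control value is $c_0 = \lceil (1-\epsilon)/\epsilon \rceil$ (the reading consistent with the ceiling mentioned in the proof), and it uses only the averages stated in the proof, namely a market average of $c_0 + 1$, a portfolio average of $c_0$ without the adjustment row, and $c_0 + m/k$ with it. The function name is hypothetical.

```python
from fractions import Fraction
from math import ceil

def control_column_ratio(eps, m, k, include_adjustment):
    """Portfolio average divided by market average in the control column."""
    c0 = ceil((1 - eps) / eps)          # assumed definition of c_0
    market_avg = c0 + 1                 # average over all m rows
    if include_adjustment:
        portfolio_avg = c0 + Fraction(m, k)
    else:
        portfolio_avg = Fraction(c0)
    return portfolio_avg / market_avg
```

With `eps = Fraction(1, 4)`, `m = 9`, and `k = 3`, the ratio is 3/4 without the adjustment row (inside the allowed range) and 3/2 with it (outside the range), matching the two cases of the lemma.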

Next we must define the values $v_0$ and $v_1$, and show the equivalence of our portfolio selection
instance with the original set cover instance. To do so, define



A 



k- 



Vq + A. 



Note that since e < 1, all these values are clearly non-negative integers, as required by the portfolio 
selection problem. 

For column t, if there are $M_t$ rows with value $v_1$, then the value we use in the adjustment row
for that column is

$$A_t = v_0 + (m - M_t)\Delta,$$

which is clearly a positive integer, since $M_t < m$. The sum down the column is

$$(m - M_t - 1)v_0 + M_t v_1 + A_t = (m - M_t - 1)v_0 + M_t(v_0 + \Delta) + v_0 + (m - M_t)\Delta = m v_0 + m\Delta,$$

which means that the column average is $v_0 + \Delta$, or just $v_1$. Notice the independence from t. We
make such an adjustment for every column in the matrix.
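The effect of the adjustment row can be verified with a few lines of Python. This hedged sketch builds one column containing a given number of high entries, the remaining ordinary entries, and one adjustment entry, and confirms that the column total is the same regardless of how many high entries it has. The function name and the sample values are hypothetical.

```python
def adjusted_column(num_v1, m, v0, delta):
    """One column of m entries: num_v1 high values, one adjustment value,
    and low values in all remaining rows (requires num_v1 <= m - 1)."""
    v1 = v0 + delta
    adjustment = v0 + (m - num_v1) * delta
    return [v1] * num_v1 + [v0] * (m - num_v1 - 1) + [adjustment]

# Every column sums to m * (v0 + delta), so the average is always v1.
sums = [sum(adjusted_column(count, 6, 4, 3)) for count in range(6)]
```

Here `sums` holds the same total for every count of high entries, which is exactly the independence from t noted above.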

We next demonstrate the equivalence of the produced portfolio selection instance with the 
original set cover instance. 

Lemma 3.2 The relative error bound is met if and only if the portfolio contains at least one $v_1$
value in each column (other than the control column).

Proof: First, for the "only if" part of the lemma, consider the case where the relative error bound
is met. Consider any specific column t of our table, and assume that this column does not contain
any $v_1$ values. By the last lemma, the adjustment row cannot be included in our portfolio, so all
values must be $v_0$, and so the portfolio average is exactly $v_0$. Therefore, we can derive

$$\frac{\Phi_1(M_k,t)}{\Phi_1(M,t)} = \frac{v_0}{v_0 + \Delta} = \frac{1}{1 + \Delta/v_0}, \qquad (8)$$

and so providing a good lower bound for $\Delta/v_0$ would in fact upper bound this ratio. We can do this
as follows:
as follows: 

A 



vo 



k + 


e 

1-e 




k±=z 

e 





U I 6 fc(l-e)+e 
^ 1— e 1— e 



k l=e + l k(l-e)+e \ _ e 
Plugging back in to (8), we get

$$\frac{\Phi_1(M_k,t)}{\Phi_1(M,t)} < \frac{1}{1 + \frac{\epsilon}{1-\epsilon}} = 1 - \epsilon.$$



Thus under our assumption that no $v_1$ values are included, the error bound is not met. We conclude
that if the error bound is met, then at least one $v_1$ value must be included in each column.

Next, for the "if" part of the lemma, assume that each column in the selected portfolio contains
at least one $v_1$ value and that we have not selected the adjustment row. Since the market average is
$v_1$, and the largest possible selected value in the portfolio is $v_1$, we know that $\Phi_1(M_k,t) \le \Phi_1(M,t)$,
and so the upper bound $1 + \epsilon$ on the relative error is trivially met for any $\epsilon > 0$.

Since we have selected at least one $v_1$ value, the portfolio average is at least $v_0 + \Delta/k$, and so
to lower bound the relative error notice that

$$\frac{\Phi_1(M_k,t)}{\Phi_1(M,t)} \ge \frac{v_0 + \Delta/k}{v_0 + \Delta} = 1 - \frac{k-1}{k} \cdot \frac{1}{\frac{v_0}{\Delta} + 1}. \qquad (9)$$

Now we will derive a lower bound for $v_0/\Delta$ in a similar way to what we did above, so



A 





k^ 




k + 


£ 

1-e 



> 



k 



1-e 



fc(l-e) 
e 

fc(l-£) + l 

1-e 



1 



_ fc(l-e) 
e fc(l-e) + l' 



Using this bound, with a little manipulation we can derive 



fc-1 



1 



m. + 1 

A + L 



< 



1 (l-e)Jfe + l 



k (1 

fc+i 



+ e 



e. 



We can bound the middle factor of this bound by -j- by noticing that 



(l-e)Jfe + l k + 1 
< 



(l-e)k + e k 
and so plugging back into (^) we get 



1 - e)k 2 + k < (1 - e)k 2 + ek + (1 - e)k + e 



< e, 



> l 



l 



A + 1 



> 1 



fc-1 fc+1 



A; 



A; 



1 



k 2 



€ > 1 



We conclude that if at least one value in column t of the selected portfolio is $v_1$, then the relative
error bound is met. Since we have completed both directions of the "if and only if" proof, this
completes the proof of the lemma. ∎

As a final note, it is fairly easy to show that all values in the constructed portfolio selection
problem have length polynomial in the length of the original set cover problem and the number of
bits used to specify $\epsilon$. Therefore, these values form a polynomial time reduction from the set cover
problem to the portfolio selection problem, which completes the proof of Theorem 3.1.



3.2 Sacrificing Return for Less Volatility 

Next, we will skip Problem 2 and prove a hardness result for Problem 3: sacrificing return for less
volatility. In the following section, we will return to Problem 2, and show that the hardness of that
problem (outperforming an index) follows directly from the results of this section.

As in §3.1, we will show that Problem 3 is NP-complete by reducing the minimum cover problem
to this one.



Figure 2: Construction for main reduction of Section 3.2. The matrix has n coding rows followed by
m − n padding rows, and P + 1 padding columns (alternating type-1 and type-2 columns) followed by
|S| coding columns. Entries in the coding region are $v_2$ if the element is in the set and 0 otherwise.



3.2.1 The construction 

The main reduction for this proof involves a problem constructed from a minimum cover instance,
and this construction is illustrated in Figure 2. This constructed problem is an instance of our
portfolio selection problem where the rows represent different stocks, the columns represent times,
and the values in the matrix represent prices.

In the original minimum cover instance, let $n = |C|$ represent the number of subsets in the
input, let $|S|$ represent the size of the overall set, and let $K$ be the number of subsets we are
allowed to select. The data from this problem can be encoded into an $n \times |S|$ matrix M, where the
values in this matrix are set as follows ($v_2$ is a value that will be defined shortly):

$$M_{i,j} = \begin{cases} v_2 & \text{if subset } i \text{ contains element } j; \\ 0 & \text{otherwise.} \end{cases}$$

We will need a larger matrix in order to complete the reduction, so we embed matrix M into our
larger matrix; in Figure 2 the embedded matrix is labeled as the "Coding Region". This gives a
portfolio selection problem with m stocks, $f = P + |S|$ time steps, and portfolio size k = K.

We surround matrix M with various "padding rows" and "padding columns". The number of
padding rows and padding columns are defined as follows:

• There are P + 1 padding columns, where $P = \max(2(k + 1), 2|S|)$.

• The total number of rows is defined in terms of the following constants:

$$q = \left\lceil \max\left(1 + (4/\beta), \log_k(2/\alpha)\right) \right\rceil, \quad \text{and} \quad B = \left\lceil \alpha k^{q} \right\rceil.$$

The total number of rows is m = nB.
The definition of q implies some important properties of the constant B that we note here:

$$B \ge 2; \qquad (10)$$

$$B \ge k\alpha \ge \alpha. \qquad (11)$$

Finally, from the first part of (11) we can derive

$$\left\lfloor \frac{B}{\alpha} \right\rfloor \ge \frac{B}{\alpha} \cdot \frac{k-1}{k}. \qquad (12)$$



All of the first n rows in the padding columns are filled with value $v_1$, and value $v_2$ is used in
the coding region as previously described. These values are defined in terms of the constant B as
follows:

• $v_1 = B - 1$

• $v_2 = k(B - 1)$

Each column may have an "adjustment value", denoted by $A_t$ for column t. Odd numbered
columns in the padding region (type-2 columns) do not have an adjustment value, but even numbered
columns other than column 0 (type-1 columns) do, and these values are positioned at successively
lower rows; therefore, if column t is a type-1 column, then $A_t$ is placed in row $n + t/2$. If we
run out of rows before completing this placement, simply put all remaining adjustment values on
the last row. Notice that since $P \ge 2(k + 1)$ there are at least k + 1 type-1 padding columns, and
since the number of padding rows is $(m - n) = (nB - n) \ge n \ge k + 1$ (using (10)), there must be
at least k + 1 distinct rows that contain adjustment values. Columns that cross the coding region
(called "coding columns") also have adjustment values, which are all placed on the last row of the
matrix (see Figure 2). The adjustment values to be used are defined below, where $z_t$ is the number
of zeros in the coding region of column t:

$$A_t = \begin{cases} (m-n)\left(\left\lfloor \frac{B}{\alpha} \right\rfloor - 1\right) & \text{if } 0 < t \le P \text{ and } t \text{ is even;} \\ (m-n)\left(\left\lfloor \frac{B}{\alpha} \right\rfloor - k\right) + z_t \cdot v_2 & \text{if } t > P. \end{cases}$$

Note that the adjustment values in the padding columns are all the same, but the adjustments
in the coding region depend on the data in the coding region. Furthermore, (11) guarantees that
these adjustment values are all non-negative.
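The constants and adjustment values of this construction can be computed and sanity-checked with a short Python sketch. It is a hedged illustration only: it assumes $B = \lceil \alpha k^q \rceil$, $v_1 = B - 1$, $v_2 = k(B-1)$, and the adjustment values $(m-n)(\lfloor B/\alpha \rfloor - 1)$ for type-1 padding columns and $(m-n)(\lfloor B/\alpha \rfloor - k) + z_t v_2$ for coding columns, which are the readings consistent with the column sums computed in the proof of Lemma 3.4. The function names are hypothetical, and $\alpha$ and $\beta$ are passed as `Fraction` values to keep the floor and ceiling exact.

```python
from fractions import Fraction
from math import ceil, floor, log

def constants(alpha, beta, k):
    """Return (q, B) for the assumed definitions of the padding constants."""
    q = ceil(max(1 + 4 / float(beta), log(2 / float(alpha), k)))
    B = ceil(alpha * k ** q)
    return q, B

def column_sums(n, k, alpha, beta, zeros_in_coding_column):
    """Return (B, floor(B/alpha), type-1 column sum, coding column sum)."""
    _, B = constants(alpha, beta, k)
    m = n * B                                  # total number of rows
    v1, v2 = B - 1, k * (B - 1)
    fl = floor(B / alpha)
    a_padding = (m - n) * (fl - 1)             # adjustment, type-1 column
    a_coding = (m - n) * (fl - k) + zeros_in_coding_column * v2
    type1_sum = n * v1 + a_padding
    coding_sum = (n - zeros_in_coding_column) * v2 + a_coding
    return B, fl, type1_sum, coding_sum
```

For instance, with n = 3, k = 2, $\alpha = 1/2$, and $\beta = 2$, both kinds of columns sum to $n(B-1)\lfloor B/\alpha \rfloor$, so type-1 padding columns and coding columns share the same market average.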

Before analyzing the return and volatility of the constructed portfolio selection problem, we 
state the following lemma regarding the size of the constructed problem, showing that we have a 
polynomial reduction — the proof of this lemma is straight-forward given the above definitions, 
and is omitted. 

Lemma 3.3 If a and j3 are expressed using bits in fixed-point binary notation, and < a < 

and j3 = Q (^^), then the size of the constructed problem (including the size of the values 
in the matrix) is polynomial in the size of the original minimum cover problem. 






3.2.2 Guarantees on Return 



Lemma 3.4 The performance bound is met for all columns if and only if the selected portfolio
contains exactly k items from the coding rows and each coding column has at least one $v_2$ value
from among the selected rows.

Proof: We will first prove that if the selected portfolio contains exactly k items from the coding
rows and each coding column has at least one $v_2$ value from the selected rows, then the performance
bound is met. First consider a padding column t. Since the k selected rows are all coding rows, all
selected values for any padding column have value $v_1$, and so the portfolio average for that column
is $\Phi_1(M_k,t) = v_1$. On the other hand, the market average is different for the two types of columns.
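If column t is a type-1 padding column, then the sum of all the values in the column is

$$n v_1 + A_t = n(B-1) + (m-n)\left(\left\lfloor \frac{B}{\alpha} \right\rfloor - 1\right) = n(B-1) + (B-1)\left(\left\lfloor \frac{B}{\alpha} \right\rfloor - 1\right) n = (B-1)\, n \left\lfloor \frac{B}{\alpha} \right\rfloor.$$

Therefore, the market average for column t satisfies

$$\Phi_1(M,t) = \frac{(B-1)\, n \left\lfloor \frac{B}{\alpha} \right\rfloor}{nB} = \frac{B-1}{B} \left\lfloor \frac{B}{\alpha} \right\rfloor \le \frac{B-1}{B} \cdot \frac{B}{\alpha} = \frac{B-1}{\alpha}. \qquad (13)$$

Furthermore, any type-2 padding column has no adjustment value, which makes the market
average smaller than a type-1 column. Therefore, for either type of padding column the bound
$\Phi_1(M,t) \le (B-1)/\alpha$ is valid, and so it immediately follows that for any padding column t, since
$\Phi_1(M,0) = \Phi_1(M_k,0) = v_1$,

$$\frac{\Phi_1(M_k,t)}{\Phi_1(M_k,0)} \ge \alpha \cdot \frac{\Phi_1(M,t)}{\Phi_1(M,0)}.$$

Therefore, the performance bound is met for all padding columns.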

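Now consider a coding column t, and recall that we are assuming that at least one $v_2$ value
from column t is included in the portfolio. This means that the portfolio average is $\Phi_1(M_k,t) \ge
v_2/k = v_1$. For the market average, we compute the sum over all values in the column, as we did
before, and in this case we get

$$(n - z_t)v_2 + A_t = n v_2 - z_t v_2 + (m-n)\left(\left\lfloor \frac{B}{\alpha} \right\rfloor - k\right) + z_t v_2 = nk(B-1) + (nB - n)\left(\left\lfloor \frac{B}{\alpha} \right\rfloor - k\right)$$

$$= nk(B-1) + (B-1)\left(n \left\lfloor \frac{B}{\alpha} \right\rfloor - nk\right) = (B-1)\, n \left\lfloor \frac{B}{\alpha} \right\rfloor.$$

Similar to the calculation for the padding columns, this gives us

$$\Phi_1(M,t) = \frac{B-1}{B} \left\lfloor \frac{B}{\alpha} \right\rfloor \le \frac{B-1}{\alpha} = \frac{v_1}{\alpha} \;\Longrightarrow\; \frac{\Phi_1(M_k,t)}{\Phi_1(M_k,0)} \ge \alpha \cdot \frac{\Phi_1(M,t)}{\Phi_1(M,0)}, \qquad (14)$$

and so the performance bound is met for the coding columns as well. Therefore we have completed
this direction of the proof.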
For the other direction, we need to show that any portfolio that meets the performance bound
must be made up of exactly k items from the coding rows and each coding column has at least
one $v_2$ value from the selected rows. We first show that any portfolio that meets the performance
bound may only use coding rows. By our placement of adjustment values, we noticed before that
there are at least k + 1 distinct padding rows that contain adjustment values. Therefore, there
must be at least one type-1 padding column, say column t, that does not have its adjustment value
$A_t$ selected as part of the portfolio. Now if all k selections are not from the coding rows, then we
can bound the portfolio average for column t by

$$\Phi_1(M_k,t) \le \frac{k-1}{k} \cdot v_1.$$

Since this is a type-1 column, (13) gives the market average, and we can further use (12) to conclude
that

*i(A4 fc ,t) $i(M,0) {J ^ L vi (fc-l)QB-l) 1 _ 

k (-B-1) B fc-l a ' 



$i(A4fc,0) $i(M,t) ~ Vl 



Vl 


(B-l) 


B 


B 


a 



B a k 



and so the performance bound would not be met. Therefore, all k row selections must come from 
the coding rows. 

Since we have established that all k selections must come from the coding rows, we will next
show that every column in the coding region must have at least one $v_2$ value among the selected
rows. This is, in fact, very easy to see: if no $v_2$ values are selected in a particular column, then the
portfolio average is zero, which cannot meet the performance bound for that column. Therefore, all
coding columns must contain at least one $v_2$ value, which completes this direction of the proof,
and also completes the entire proof. ∎



3.2.3 Guarantees on Volatility 

Lemma 3.5 If the performance bound is met for our constructed portfolio selection problem, then
the volatility bound is met as well.



Proof: Assume we have a solution that meets the performance bounds. Then by Lemma 3.4 we
know that all k selected rows are coding rows and that each coding column contains at least one
$v_2$ value. From this information, we can bound the volatility of both the market and the portfolio.

The first observation is that the portfolio average is exactly $v_1$ for every padding column,
including column 0, and this constant average means that the portfolio volatility is exactly zero for
all of the padding columns (so $\Delta_1(M_k,t) = 0$ for all $t \le P$). Since the portfolio volatility is zero,
the volatility bound is trivially met whenever $t \le P$.

For t > P we bound the market volatilities first. We have already computed the market averages
for the type-1 columns (in (13)) and for coding columns (in (14)), but we need to compute the
market average for type-2 columns. Since there are exactly n values of $v_1$ in a type-2 column, and
there are m = nB total rows, the market average of a type-2 column is simply
$n(B-1)/(nB) = (B-1)/B$. We summarize all market averages below:

$$\Phi_1(M,t) = \begin{cases} v_1 & \text{if } t = 0; \\ \frac{B-1}{B} & \text{if } t < P \text{ and } t \text{ is odd;} \\ \frac{B-1}{B} \left\lfloor \frac{B}{\alpha} \right\rfloor & \text{otherwise.} \end{cases}$$
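These values can then be used to compute the one-period returns for the market:

$$R_1(M,i) = \begin{cases} -\ln B & \text{if } i = 1; \\ -\ln \left\lfloor \frac{B}{\alpha} \right\rfloor & \text{if } 1 < i < P \text{ and } i \text{ is odd;} \\ \ln \left\lfloor \frac{B}{\alpha} \right\rfloor & \text{if } i \le P \text{ and } i \text{ is even;} \\ 0 & \text{if } i > P. \end{cases}$$

Recall that we are only interested in volatilities for times t > P, and from the above we can
derive for t > P

$$\bar{R}_1(M,t) = \frac{1}{t} \ln \frac{\Phi_1(M,t)}{\Phi_1(M,0)} = \frac{1}{t} \ln \left( \frac{1}{B} \left\lfloor \frac{B}{\alpha} \right\rfloor \right).$$

This market average return can be either positive or negative, depending on the value of $\alpha$, so we
consider these two situations separately. First, if $\alpha \ge 1$, then $B \ge \lfloor B/\alpha \rfloor$, and so $\bar{R}_1(M,t) \le 0$,
which implies that when i is even we have

$$R_1(M,i) - \bar{R}_1(M,t) \ge R_1(M,i) = \ln \left\lfloor \frac{B}{\alpha} \right\rfloor \;\Longrightarrow\; \left( R_1(M,i) - \bar{R}_1(M,t) \right)^2 \ge \left( \ln \left\lfloor \frac{B}{\alpha} \right\rfloor \right)^2.$$

On the other hand, if $\alpha < 1$, then $B \le \lfloor B/\alpha \rfloor$, and so $\bar{R}_1(M,t) \ge 0$, which implies that when i is
odd and greater than 1 we have

$$R_1(M,i) - \bar{R}_1(M,t) \le R_1(M,i) = -\ln \left\lfloor \frac{B}{\alpha} \right\rfloor \;\Longrightarrow\; \left( R_1(M,i) - \bar{R}_1(M,t) \right)^2 \ge \left( \ln \left\lfloor \frac{B}{\alpha} \right\rfloor \right)^2.$$

Notice that in both cases, we have the same bound, and we can guarantee that this bound holds for
at least $P/2 - 1$ columns. Using this fact, we can bound the market volatilities for t > P as follows: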



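$$\Delta_1(M,t) = \sqrt{\frac{\sum_{i=1}^{t} \left( R_1(M,i) - \bar{R}_1(M,t) \right)^2}{t-1}} \ge \sqrt{\frac{\left( \frac{P}{2} - 1 \right) \left( \ln \left\lfloor \frac{B}{\alpha} \right\rfloor \right)^2}{t-1}}.$$

Since $t \le P + |S|$, $P \ge 2|S|$, and $P \ge 6$, we can bound $\frac{P/2 - 1}{t-1} = \frac{P-2}{2(t-1)} \ge \frac{1}{4}$, and then use (12) to derive

$$\Delta_1(M,t) \ge \frac{1}{2} \ln \left\lfloor \frac{B}{\alpha} \right\rfloor \ge \frac{1}{2} \ln \left( \frac{B}{\alpha} \cdot \frac{k-1}{k} \right) \ge \frac{1}{2} \ln \left( k^{(4/\beta)+1} \cdot \frac{k-1}{k} \right) = \frac{1}{2} \ln \left( k^{(4/\beta)} (k-1) \right) \ge \frac{2}{\beta} \ln k. \qquad (15)$$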

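Next, we will find an upper bound for the portfolio volatility. As mentioned before, the portfolio
averages for $t \le P$ are constant values $v_1$. For t > P, the portfolio averages are data dependent,
but we can certainly bound them by the closed interval

$$\Phi_1(M_k,t) \in \left[ \frac{v_2}{k}, v_2 \right] = [B - 1, k(B - 1)].$$

Using this bound, we can bound the one-period portfolio returns by

$$\ln \frac{\Phi_1(M_k,t)}{\Phi_1(M_k,t-1)} \in \left[ \ln \frac{B-1}{k(B-1)}, \ln \frac{k(B-1)}{B-1} \right] = [-\ln k, \ln k],$$

and we can also bound the portfolio's average return by

$$\frac{1}{t} \ln \frac{\Phi_1(M_k,t)}{\Phi_1(M_k,0)} \in \left[ \frac{1}{t} \ln \frac{B-1}{B-1}, \frac{1}{t} \ln \frac{k(B-1)}{B-1} \right] = \left[ 0, \frac{1}{t} \ln k \right].$$

Given these bounds, the largest possible value for $\left( R_1(M_k,i) - \bar{R}_1(M_k,t) \right)^2$ is $\left( \frac{t+1}{t} \ln k \right)^2$, and so

$$\Delta_1(M_k,t) = \sqrt{\frac{\sum_{i=1}^{t} \left( R_1(M_k,i) - \bar{R}_1(M_k,t) \right)^2}{t-1}} \le \sqrt{\frac{t \left( \frac{t+1}{t} \ln k \right)^2}{t-1}} = \sqrt{\frac{(t+1)^2}{t(t-1)}} \, \ln k. \qquad (16)$$

Finally, since $t \ge P + 1 \ge 3$, we can bound

$$\Delta_1(M_k,t) \le 2 \ln k.$$

Combining (15) and (16) we get

$$\frac{\Delta_1(M_k,t)}{\Delta_1(M,t)} \le \frac{2 \ln k}{\frac{2}{\beta} \ln k} = \beta,$$

and so the volatility bounds are met. ∎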
3.2.4 The main result 

Theorem 3.2 Let a and (3 be values expressed using bits in fixed-point binary notation, and 

satisfying < a < and (3 = Q ■ Then the problem of sacrificing return for less volatility 

using the price-weighted index is NP-complete. 



Proof: Follows immediately from Lemmas 3.3, 3.4, and 3.5. ∎



3.3 Outperforming an index 

Given the results of the previous section, showing that the problem of outperforming an index is
NP-complete is trivial. In particular, we use the exact same construction as in Section 3.2 (for
concreteness in the construction, use $\beta = 4$), and then our result follows from direct application of
Lemmas 3.3 and 3.4.

Theorem 3.3 Let $\epsilon$ be any value satisfying $0 < \epsilon < n^c$ for some constant c. Then the problem of
outperforming the market average using the price-weighted index with bound $\epsilon$ is NP-hard.

We note here that the construction of Section 3.2 gives us a slightly stronger result: We can
actually let $\epsilon$ be as small as $-1 + 2^{-n^{O(1)}}$. However, the disadvantage of using this reduction is that
it is in fact more complicated than necessary for this problem; a direct, and simpler, reduction
for the problem of outperforming an index is given in the appendix.
4 Other Indices



For the value-weighted and equal-weighted indices, we will, in fact, use the exact same constructions
as in the previous section; the prices in the constructed problem have been selected carefully so
that they work using related indices, such as the value-weighted and equal-weighted indices. The
results will follow fairly easily from the following lemma.



Lemma 4.1 Let $\Phi_j(B,t)$ be an index function where $S_{i,0} = c$ for some constant c implies that

$$\frac{\Phi_j(B,t)}{\Phi_j(B,0)} = d \cdot \Phi_1(B,t) \qquad (17)$$

for all sets of stocks $B \subseteq M$, where d is a constant that does not depend on $B$ or t. Then all of the
previous NP-completeness results hold for index $\Phi_j(B,t)$.

Proof: Note that in all the problem statements, whenever an index value is used, it is always used
in a ratio with the same index function, either at a different time step or for a different set of stocks.
This will allow us to cancel out common factors, and the resulting problem will be in terms of the
price-weighted index ($\Phi_1(B,t)$). For example, in considering the tracking problem, we need to have
a subset $M_k$ of k stocks such that for all t = 1, . . . , f,

$$\left| \frac{\Phi_j(M_k,t)}{\Phi_j(M_k,0)} - \frac{\Phi_j(M,t)}{\Phi_j(M,0)} \right| \le \epsilon_1 \cdot \frac{\Phi_j(M,t)}{\Phi_j(M,0)}.$$

Due to the condition of equation (17), this bound is met if and only if

$$\left| \frac{d \cdot \Phi_j(M_k,0) \cdot \Phi_1(M_k,t)}{d \cdot \Phi_j(M_k,0) \cdot \Phi_1(M_k,0)} - \frac{d \cdot \Phi_j(M,0) \cdot \Phi_1(M,t)}{d \cdot \Phi_j(M,0) \cdot \Phi_1(M,0)} \right| \le \epsilon_1 \cdot \frac{d \cdot \Phi_j(M,0) \cdot \Phi_1(M,t)}{d \cdot \Phi_j(M,0) \cdot \Phi_1(M,0)},$$

and cancelling common terms we see that this is met if and only if

$$\left| \frac{\Phi_1(M_k,t)}{\Phi_1(M_k,0)} - \frac{\Phi_1(M,t)}{\Phi_1(M,0)} \right| \le \epsilon_1 \cdot \frac{\Phi_1(M,t)}{\Phi_1(M,0)}.$$

Therefore, the tracking problem using the $\Phi_j$ index function is entirely equivalent to the problem
using the $\Phi_1$ index function.

Exactly the same derivation can be performed on the Problem 2 condition (3), on the definition
of $R_j(B,t)$, and on the Problem 3 performance bound (4). Therefore, all of these problems are
equivalent to using the price-weighted index, and our previous reductions apply. ∎



4.1 The Value-Weighted Index



We first apply Lemma 4.1 to the value-weighted index. For the value-weighted index, we must
indicate the weights (the $w_i$'s) in the constructed portfolio selection problem as well as the prices.
In all of our constructions, we will pick $w_i = 1$ for all i.

If $S_{i,0} = c$ for some constant c, then for any valid time t and any set of stocks $B$, using $w_i = 1$
gives

$$\Phi_2(B,t) = \frac{\sum_{i=1}^{b} w_i S_{i,t}}{\sum_{i=1}^{b} w_i S_{i,0}} = \frac{\sum_{i=1}^{b} S_{i,t}}{\sum_{i=1}^{b} c} = \frac{\sum_{i=1}^{b} S_{i,t}}{bc} = \frac{1}{c} \Phi_1(B,t).$$

Furthermore, regardless of $B$ we have $\Phi_2(B,0) = 1$, and so Lemma 4.1 holds with constant $d = 1/c$.
The following three theorems are a direct consequence of this Lemma.
Theorem 4.1 Let $\epsilon$ be any error bound satisfying $0 < \epsilon < 1$ and specified using $O(\log n)$ bits in fixed 
point notation. Then the tracking problem for a value-weighted index with error bound $\epsilon$ is NP-hard. 



Theorem 4.2 Let $\epsilon$ be any value satisfying $0 < \epsilon < n^c$ for some constant $c$. Then the problem of 
outperforming the market average using the value-weighted index with bound $\epsilon$ is NP-hard. 



Theorem 4.3 Let $\alpha$ and $\beta$ be values expressed using [...] bits in fixed-point binary notation, and 
satisfying $0 < \alpha <$ [...] and $\beta =$ [...]. Then the problem of sacrificing return for less volatility 
using the value-weighted index is NP-complete. 
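
The identity $\Phi_2(B,t) = \frac{1}{c}\Phi_1(B,t)$ that underlies these three theorems can be checked numerically. The following is a minimal sketch rather than part of the original construction: the function names and the small price table are hypothetical, and exact rational arithmetic is used so that the comparison is a true equality test.

```python
from fractions import Fraction

def price_weighted(prices, stocks, t):
    """Phi_1: arithmetic mean of the prices of the given stocks at time t."""
    return Fraction(sum(prices[i][t] for i in stocks), len(stocks))

def value_weighted(prices, weights, stocks, t):
    """Phi_2: total weighted value at time t divided by the value at time 0."""
    value_now = sum(weights[i] * prices[i][t] for i in stocks)
    value_start = sum(weights[i] * prices[i][0] for i in stocks)
    return Fraction(value_now, value_start)

# Hypothetical data: prices[i][t] is S_{i,t}; every stock starts at S_{i,0} = c.
c = 4
prices = [[c, 6, 2], [c, 1, 9], [c, 8, 8], [c, 3, 5]]
weights = [1, 1, 1, 1]  # w_i = 1 for all i, as in the construction

def identity_holds(stocks):
    """Check Phi_2(B, t) == Phi_1(B, t) / c for every time step t."""
    return all(
        value_weighted(prices, weights, stocks, t)
        == price_weighted(prices, stocks, t) / c
        for t in range(len(prices[0]))
    )

# Try the full market and a few arbitrary subsets B.
checks = [identity_holds(b) for b in ([0, 1, 2, 3], [0, 2], [1, 2, 3], [3])]
print(checks)
```

The check passes for every subset because the common initial price $c$ factors out of the denominator, which is exactly the cancellation Lemma 4.1 relies on.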



4.2 The Equal-Weighted Index 

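If $S_{i,0} = c$ for all $i$, then 

$$\Phi_3(B,t) = \sum_{i=1}^{b} \frac{S_{i,t}}{S_{i,0}} = \sum_{i=1}^{b} \frac{S_{i,t}}{c} = \frac{1}{c} \sum_{i=1}^{b} S_{i,t} = \frac{b}{c}\,\Phi_1(B,t).$$

It's easy to see that $\Phi_3(B,0) = b$, so 

$$\frac{\Phi_3(B,t)}{\Phi_3(B,0)} = \frac{1}{c}\,\Phi_1(B,t),$$

and so Lemma 4.1 applies with constant $d = 1/c$. The following three theorems are direct 
consequences of that Lemma. 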

Theorem 4.4 Let $\epsilon$ be any error bound satisfying $0 < \epsilon < 1$ and specified using $O(\log n)$ bits in fixed 
point notation. Then the tracking problem for an equal-weighted index with error bound $\epsilon$ is NP-hard. 



Theorem 4.5 Let $\epsilon$ be any value satisfying $0 < \epsilon < n^c$ for some constant $c$. Then the problem of 
outperforming the market average using the equal-weighted index with bound $\epsilon$ is NP-hard. 



Theorem 4.6 Let $\alpha$ and $\beta$ be values expressed using [...] bits in fixed-point binary notation, and 
satisfying $0 < \alpha <$ [...] and $\beta =$ [...]. Then the problem of sacrificing return for less volatility 
using the equal-weighted index is NP-complete. 
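
Lemma 4.1 implies more than a ratio identity: the tracking test itself should give the same answer under $\Phi_3$ and under $\Phi_1$. The sketch below illustrates this on a toy market. It assumes the tracking condition is the two-sided relative bound $|p - m| \le \epsilon\, m$ on index values normalized by their time-0 value; the function names, the price table, and the choice $\epsilon = 1/4$ are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def price_weighted(prices, stocks, t):
    """Phi_1: arithmetic mean of the prices of the given stocks at time t."""
    return Fraction(sum(prices[i][t] for i in stocks), len(stocks))

def equal_weighted(prices, stocks, t):
    """Phi_3: sum of the price relatives S_{i,t} / S_{i,0}."""
    return sum(Fraction(prices[i][t], prices[i][0]) for i in stocks)

def tracks(index, prices, portfolio, market, eps):
    """Assumed tracking test: relative error at most eps at every t >= 1."""
    for t in range(1, len(prices[0])):
        p = index(prices, portfolio, t) / index(prices, portfolio, 0)
        m = index(prices, market, t) / index(prices, market, 0)
        if abs(p - m) > eps * m:
            return False
    return True

# Hypothetical data with a common initial price S_{i,0} = c = 4.
prices = [[4, 6, 2], [4, 1, 9], [4, 8, 8], [4, 3, 5]]
market = [0, 1, 2, 3]
eps = Fraction(1, 4)

# For every 2-stock portfolio, record the verdict under Phi_1 and under Phi_3.
agreement = {
    portfolio: (
        tracks(price_weighted, prices, portfolio, market, eps),
        tracks(equal_weighted, prices, portfolio, market, eps),
    )
    for portfolio in combinations(market, 2)
}
print(agreement)
```

Both verdicts coincide for every portfolio, so a reduction proved for the price-weighted index carries over unchanged once all initial prices are equal.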

4.3 The Price-Relative Index 

The price-relative index is a geometric mean of the values in a set of stocks, whereas our first index 
(the price-weighted index) is the arithmetic mean. In this section we will show that, at least for the 
first two problems, we can transform the reductions for the price-weighted index into reductions 
for the price-relative index, and thus obtain NP-hardness results for the price-relative index. For 
the second problem (outperforming an index), we use the simpler reduction given in the appendix. 
We will use the notation $(S, \epsilon, \Phi_j)$ to denote an instance of a portfolio selection problem with prices 
$S_{i,t}$, error bound $\epsilon$, and index function $\Phi_j$. 

The first step in transforming the reductions for the price-relative index is to change them so 
that every column, including the control column, has the same market average. If $c_1, c_2, \ldots, c_n$ 
are the column sums of columns 1 through $n$, then let $c = \mathrm{LCM}(c_1, \ldots, c_n)$ be the least common 
multiple of these sums. We create a new set of prices by setting $S'_{i,t} = \frac{c}{c_t} S_{i,t}$ at all times $t \ge 1$. 
Now the sum down column $t$ is 

$$\sum_{i} S'_{i,t} = \sum_{i} \frac{c}{c_t}\, S_{i,t} = \frac{c}{c_t} \sum_{i} S_{i,t} = \frac{c}{c_t} \cdot c_t = c,$$

which is independent of the actual column, so all columns will now have the same average value 
(so $\Phi_1(M,t_1) = \Phi_1(M,t_2)$ for all times $t_1, t_2 \ge 1$). And finally, since the first two problems treat 
columns independently and the bounds are relative error bounds, if all values in a particular column 
are multiplied by a particular value, this "scaling up" does not change whether or not the error 
bound is met. Therefore, for problem 1 or problem 2, the instance $(S_{i,t}, \epsilon, \Phi_1)$ satisfies the bound 
if and only if the instance $(S'_{i,t}, \epsilon, \Phi_1)$ satisfies the bound. 
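
The column-scaling step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the helper names and the 3-stock price table are hypothetical, prices are assumed to be positive integers, and `math.lcm` requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm  # available in Python 3.9+

def equalize_columns(prices):
    """Scale column t >= 1 by c / c_t, where c is the LCM of the column sums.

    Column 0 (the initial prices) is left unchanged. Assumes positive integer
    prices, so every column sum c_t is a positive integer dividing c.
    """
    n_cols = len(prices[0])
    col_sums = [sum(row[t] for row in prices) for t in range(1, n_cols)]
    c = lcm(*col_sums)
    factors = [1] + [c // s for s in col_sums]
    scaled = [[row[t] * factors[t] for t in range(n_cols)] for row in prices]
    return scaled, c

def relative_average(prices, portfolio, t):
    """Portfolio average divided by market average in column t."""
    portfolio_avg = Fraction(sum(prices[i][t] for i in portfolio), len(portfolio))
    market_avg = Fraction(sum(row[t] for row in prices), len(prices))
    return portfolio_avg / market_avg

# Hypothetical table: rows are stocks, column 0 is the initial column of ones.
prices = [[1, 2, 1], [1, 1, 3], [1, 1, 2]]
scaled, c = equalize_columns(prices)
print(scaled, c)
```

Because each column is multiplied by a single factor, `relative_average` returns the same value before and after scaling, which is why relative error bounds are unaffected by this step.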

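The next step in transforming the reductions is to change all the $S'_{i,t}$ values into new values 
$S''_{i,t} = 2^{S'_{i,t}}$ for $t \ge 1$, while keeping $S''_{i,0} = 1$ for all $i$. The result of this is that for any set of $b$ stocks 
$B$, and any $t \ge 1$, 

$$\Phi_4(B,t) = \left( \prod_{i=1}^{b} \frac{S''_{i,t}}{S''_{i,0}} \right)^{1/b} = \left( \prod_{i=1}^{b} 2^{S'_{i,t}} \right)^{1/b} = 2^{\Phi_1(B,t)},$$

where $\Phi_4$ is evaluated on the prices $S''$ and $\Phi_1$ on the prices $S'$. 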

We will also need to transform the e values, but this is done differently for the two problems, and 
so is handled separately below. 
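
The effect of the exponentiation step, that the geometric mean of the new prices equals 2 raised to the arithmetic mean of the old ones, can be illustrated as follows. This is a hedged sketch: the function names and the table `s_prime` are hypothetical, and floating-point arithmetic is used for the geometric mean, so the comparison is approximate.

```python
import math
from fractions import Fraction

def price_weighted(prices, stocks, t):
    """Phi_1: arithmetic mean of the prices of the given stocks at time t."""
    return Fraction(sum(prices[i][t] for i in stocks), len(stocks))

def price_relative(prices, stocks, t):
    """Phi_4: geometric mean of S_{i,t} / S_{i,0}, computed via base-2 logs."""
    log_sum = sum(math.log2(prices[i][t] / prices[i][0]) for i in stocks)
    return 2.0 ** (log_sum / len(stocks))

def exponentiate(prices):
    """Map S' to S'': 2**S'_{i,t} for t >= 1, with the initial column set to 1."""
    return [[1] + [2 ** value for value in row[1:]] for row in prices]

# Hypothetical scaled table S' whose columns t >= 1 all sum to 12.
s_prime = [[1, 6, 2], [1, 3, 6], [1, 3, 4]]
s_double_prime = exponentiate(s_prime)

portfolio = [0, 1]
for t in (1, 2):
    geometric = price_relative(s_double_prime, portfolio, t)
    power_of_mean = 2.0 ** float(price_weighted(s_prime, portfolio, t))
    print(t, geometric, power_of_mean)
```

Note that the entries of $S''$ grow exponentially in the entries of $S'$, so the number of bits needed to write them is proportional to the values in $S'$.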

Theorem 4.7 Let $\epsilon$ be any error bound satisfying $0 < \epsilon < 1$ and specified using $O(\log n)$ bits in 
fixed point notation. Then the tracking problem for a price-relative index with error bound $\epsilon$ is 
NP-hard. 

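Proof: Let $\epsilon' = \frac{-\lg(1-\epsilon)}{c/m}$, where $m$ is the number of stocks in the entire market (or the number of 
rows in our table), and $c$ is the common column sum as described above in the transformation from 
$S$ to $S'$. Now we show that $(S'', \epsilon, \Phi_4)$ satisfies the tracking lower bound if and only if $(S', \epsilon', \Phi_1)$ 
does, which by the scaling argument above holds if and only if $(S, \epsilon', \Phi_1)$ does. In the chain below, 
$\Phi_1$ is evaluated on the prices $S'$ (so that $\Phi_1(M,t) = c/m$) and $\Phi_4$ on the prices $S''$: 

$$\begin{aligned}
\Phi_1(M_k,t) &\ge (1-\epsilon')\,\Phi_1(M,t) \\
\iff \Phi_1(M_k,t) &\ge \Phi_1(M,t) - \epsilon'\,\Phi_1(M,t) \\
\iff \Phi_1(M_k,t) &\ge \Phi_1(M,t) + \frac{\lg(1-\epsilon)}{c/m}\,\Phi_1(M,t) \\
\iff \Phi_1(M_k,t) &\ge \Phi_1(M,t) + \frac{\lg(1-\epsilon)}{c/m} \cdot \frac{c}{m} \\
\iff \Phi_1(M_k,t) &\ge \Phi_1(M,t) + \lg(1-\epsilon) \\
\iff 2^{\Phi_1(M_k,t)} &\ge 2^{\Phi_1(M,t) + \lg(1-\epsilon)} \\
\iff 2^{\Phi_1(M_k,t)} &\ge (1-\epsilon)\, 2^{\Phi_1(M,t)} \\
\iff \Phi_4(M_k,t) &\ge (1-\epsilon)\,\Phi_4(M,t)
\end{aligned}$$

Furthermore, since the $S_{i,t}$ values come from the reduction for Theorem 3.1, the tracking upper 
bound is trivially met for $(S'', \epsilon, \Phi_4)$ just like it is trivially met for $(S, \epsilon', \Phi_1)$ (all acceptable portfolio 
averages are in fact less than the market average). 

Therefore, $(S'', \epsilon, \Phi_4)$ satisfies the tracking bound (both upper and lower) if and only if $(S, \epsilon', \Phi_1)$ 
does, and so we can use $(S'', \epsilon, \Phi_4)$ in the reduction for the tracking problem in place of $(S, \epsilon', \Phi_1)$, 
and the validity of the reduction for $(S'', \epsilon, \Phi_4)$ follows directly from the results of Theorem 3.1. 

Examining the number of bits required for the various values in the reduction, we get the 
NP-hardness result stated in the theorem. ∎ 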
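
The change of error bound in this proof can be illustrated numerically. The sketch below assumes the formula $\epsilon' = -\lg(1-\epsilon)/(c/m)$ as read from the proof, treats the portfolio average $\Phi_1(M_k,t)$ as a free input, and uses hypothetical function names and sample values ($c = 12$, $m = 3$, $\epsilon = 1/2$). It uses floating-point arithmetic, so the sample averages are chosen away from the exact threshold.

```python
import math

def tracking_eps_prime(eps, c, m):
    """eps' = -lg(1 - eps) / (c / m), the bound used for the Phi_1 instance."""
    return -math.log2(1.0 - eps) / (c / m)

def lower_bound_phi1(portfolio_avg, eps_prime, c, m):
    """Lower bound on S': Phi_1(M_k, t) >= (1 - eps') * Phi_1(M, t), Phi_1(M, t) = c/m."""
    return portfolio_avg >= (1.0 - eps_prime) * (c / m)

def lower_bound_phi4(portfolio_avg, eps, c, m):
    """Lower bound on S'': 2**Phi_1(M_k, t) >= (1 - eps) * 2**(c/m)."""
    return 2.0 ** portfolio_avg >= (1.0 - eps) * 2.0 ** (c / m)

c, m, eps = 12, 3, 0.5
eps_prime = tracking_eps_prime(eps, c, m)

# Compare both verdicts for several hypothetical portfolio averages.
samples = [1.0, 2.5, 3.5, 4.0]
verdicts = [
    (lower_bound_phi1(avg, eps_prime, c, m), lower_bound_phi4(avg, eps, c, m))
    for avg in samples
]
print(eps_prime, verdicts)
```

With these values the threshold on the portfolio average is $(1-\epsilon')\,c/m = 3$ in the first form and $\lg\big((1-\epsilon)\,2^{c/m}\big) = 3$ in the second, so the two tests agree on every sample.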



Theorem 4.8 Let $\epsilon$ be any value satisfying $0 < \epsilon < n^c$ for some constant $c$. Then the problem of 
outperforming the market average using the price-relative index with bound $\epsilon$ is NP-hard. 

Proof: Similar to the derivation in the previous theorem, except we use $\epsilon' = \frac{\lg(1+\epsilon)}{c/m}$. ∎ 

Finally, we end this section by noting that our final problem, sacrificing return for less volatility, 
does not have independent column values as problems 1 and 2 did, and so the above transformation 
idea does not work. We leave the complexity of the combination of price-relative index and problem 
3 as an open problem. 
Figure 3: Pictorial depiction of reduction for Theorem A.1 

A Direct construction for outperforming an index 

We now turn our attention to the problem of finding a portfolio that outperforms the market 
average at every time step. In particular, we are looking for a portfolio $M_k$ of size $k$ which satisfies 
(3). As we did in the first construction (for tracking an index), we rewrite this condition as follows: 

$$\frac{\Phi_1(M,0)}{\Phi_1(M_k,0)} \cdot \frac{\Phi_1(M_k,t)}{\Phi_1(M,t)} \ge 1 + \epsilon. \qquad (18)$$



Theorem A.1 Let $\epsilon$ be any value satisfying $0 < \epsilon < n^c$ for some constant $c$. Then the problem of 
portfolio selection for outperforming the market average with bound $\epsilon$ is NP-hard. 

Proof: The reduction used in this proof is shown pictorially in Figure 3. The indicator variables 
in this case are simple zero and one values (set to one if and only if the element represented by 
that row is in the subset represented by that column). The adjustment row contains values so that 
each column except the control column has sum $n$. This is clearly possible for each column, using 
only integer values between 0 and $n$. We also again use an initial column of all ones, which reduces 
condition (18) to just 

$$\frac{\Phi_1(M_k,t)}{\Phi_1(M,t)} \ge 1 + \epsilon.$$

We first show that the required bound is met for the control column if and only if the selected 
portfolio is made up entirely of rows from the first $kn$ rows (i.e., those rows that contain a 1 in 
the control column). In particular, the adjustment row may not be included in the portfolio. The 
market average for the control column is simply 

$$\Phi_1(M,t) = \frac{kn}{\lceil (1+\epsilon)kn \rceil}.$$

Obviously, when the portfolio $M_k$ is made up entirely of these rows, the portfolio average in the 
control column is 1, so we can bound 

$$\frac{\Phi_1(M_k,t)}{\Phi_1(M,t)} = \frac{\lceil (1+\epsilon)kn \rceil}{kn} \ge 1 + \epsilon.$$

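On the other hand, when only $k - 1$ or fewer of the portfolio rows begin with a 1, then the portfolio 
average is at most $1 - \frac{1}{k}$, and so we can bound 

$$\begin{aligned}
\frac{\Phi_1(M_k,t)}{\Phi_1(M,t)} &\le \left(1 - \frac{1}{k}\right) \frac{\lceil (1+\epsilon)kn \rceil}{kn} < \left(1 - \frac{1}{k}\right) \frac{(1+\epsilon)kn + 1}{kn} \\
&= \left(1 - \frac{1}{k}\right)\left(1 + \epsilon + \frac{1}{kn}\right) = 1 + \epsilon + \frac{1}{kn} - \frac{1}{k} - \frac{\epsilon}{k} - \frac{1}{k^2 n} \\
&= 1 + \epsilon - \frac{n-1}{kn} - \frac{\epsilon}{k} - \frac{1}{k^2 n} < 1 + \epsilon.
\end{aligned}$$

Therefore, the desired bound is met only if all $k$ selected rows begin with a 1. 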
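
The two control-column bounds can be checked exhaustively for small parameters. The following is a minimal sketch under the assumptions read from the proof: the market has $\lceil (1+\epsilon)kn \rceil$ rows, exactly $kn$ of which hold a 1 in the control column, and every other entry in that column is 0. The function names and the parameter grid are hypothetical; exact rational arithmetic avoids rounding issues near the threshold.

```python
from fractions import Fraction
from math import ceil

def control_column_ratio(k, n, eps, ones_selected):
    """Phi_1(M_k, t) / Phi_1(M, t) in the control column.

    ones_selected of the k portfolio rows hold a 1 (the rest hold 0); the
    market has ceil((1 + eps) * k * n) rows, k * n of which hold a 1.
    """
    rows = ceil((1 + eps) * k * n)
    portfolio_avg = Fraction(ones_selected, k)
    market_avg = Fraction(k * n, rows)
    return portfolio_avg / market_avg

def bound_separates(k, n, eps):
    """True if the bound 1 + eps holds exactly when all k rows hold a 1."""
    full = control_column_ratio(k, n, eps, k) >= 1 + eps
    partial = all(control_column_ratio(k, n, eps, j) < 1 + eps for j in range(k))
    return full and partial

# Hypothetical parameter grid; eps is kept as an exact fraction.
results = [
    bound_separates(k, n, Fraction(p, q))
    for k in range(1, 6)
    for n in range(1, 6)
    for (p, q) in [(1, 10), (1, 3), (2, 1), (7, 2)]
]
print(all(results))
```

The ceiling in the row count is what keeps the full-portfolio ratio at or above $1 + \epsilon$, while losing even one row costs a factor of $1 - \frac{1}{k}$, which is enough to fall strictly below the bound.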

We next show that the desired bound for each of the other columns is met if and only if at least one 
selected row contains a non-zero value in that column. If no such rows are selected, all selected rows 
contain 0 and so the portfolio average is 0. This clearly cannot meet our required bound. On 
the other hand, if even one row is included with a non-zero value, then $\Phi_1(M_k,t) \ge \frac{1}{k}$, while the 
market average for this column is clearly $\frac{n}{\lceil (1+\epsilon)kn \rceil}$. This leads to 

$$\frac{\Phi_1(M_k,t)}{\Phi_1(M,t)} \ge \frac{\lceil (1+\epsilon)kn \rceil}{kn} \ge 1 + \epsilon,$$

and so the desired bound is met. We note that in order to meet the desired bound on all columns, 
the adjustment row must not be selected, and therefore the non-zero value required in each column 
of the portfolio must come from the indicator variables of the original set cover problem. Therefore, 
an acceptable portfolio exists if and only if an acceptable set cover exists. ∎ 